Supporting Research Data Collection from YouTube with TubeKit
Author
Abstract
We present TubeKit, a query-based YouTube crawling toolkit. It is a collection of tools for building a custom crawler that searches YouTube from a set of seed queries and collects up to 17 different attributes. TubeKit assists in every phase of this process, from database creation to providing access to the collected data through browsing and searching interfaces. We further demonstrate how we used the toolkit to collect election-related data from YouTube for nearly two years, and we present some analysis of the collected data as it relates to the elections.
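The workflow described in the abstract (create a database, crawl from seed queries, then search the collected data) can be sketched roughly as follows. This is a minimal illustration and not TubeKit's actual code: the function names, the three-column attribute subset, and the injected `search` callable are all hypothetical, and no real YouTube API call is made.

```python
import sqlite3
from typing import Callable, Iterable, List, Tuple

# One search result, mapping attribute name -> value. The attribute names
# used here are assumptions; the abstract does not list TubeKit's 17 attributes.
Video = dict


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the storage table (the 'database creation' phase)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS videos ("
        "video_id TEXT, query TEXT, title TEXT, view_count INTEGER, "
        "PRIMARY KEY (video_id, query))"
    )
    return conn


def crawl(
    conn: sqlite3.Connection,
    seed_queries: Iterable[str],
    search: Callable[[str], List[Video]],
) -> None:
    """Run every seed query through `search` and store the results.

    `search` is a hypothetical stand-in for whatever fetches results
    from YouTube; it is injected so the sketch stays self-contained.
    """
    for query in seed_queries:
        for video in search(query):
            # The (video_id, query) key makes repeated crawls idempotent.
            conn.execute(
                "INSERT OR IGNORE INTO videos VALUES (?, ?, ?, ?)",
                (video["video_id"], query, video["title"], video["view_count"]),
            )
    conn.commit()


def find_videos(conn: sqlite3.Connection, term: str) -> List[Tuple]:
    """Search collected titles (a tiny stand-in for a searching interface)."""
    return conn.execute(
        "SELECT video_id, query, title, view_count FROM videos "
        "WHERE title LIKE ? ORDER BY video_id",
        (f"%{term}%",),
    ).fetchall()


if __name__ == "__main__":
    def fake_search(query: str) -> List[Video]:
        # Canned data in place of a real search backend.
        return [{"video_id": f"{query}-1", "title": f"{query} speech", "view_count": 10}]

    db = create_database()
    crawl(db, ["election", "debate"], fake_search)
    print(find_videos(db, "debate"))
```

Injecting the search function keeps the storage and lookup logic testable without network access; a real crawler would replace `fake_search` with a client for the video service and widen the table to the attributes it actually collects.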
Similar resources
A Survey of Current YouTube Video Characteristics
Traffic produced by YouTube has had a significant impact on both fixed and mobile networks. The study and evaluation of YouTube content features can benefit network traffic engineering by supporting the development of sustainable video delivery services and regulation of network traffic. Such evaluations are particularly useful to network operators who aim to refine and optimize existing cache a...
Linked Data Collection and Analysis Platform of Audio Features
Audio features extracted from music are commonly used in music information retrieval (MIR), but there is no open platform for data collection and analysis of audio features. Therefore, we build the platform for the data collection and analysis for MIR research. On the platform, we represent the music data with Linked Data. In this paper, we first investigate the frequency of the audio features ...
YOUStatAnalyzer: a tool for analysing the dynamics of YouTube content popularity
Understanding the dynamics of on-line content popularity is an active research field with application in sectors as diverse as media advertising, content replication and caching and on-line marketing. In most cases, scientists have focused on user-generated contents, which are freely accessible through different on-line services. Among such services, the incumbent one is indeed YouTube. This on...
Reverse Engineering the YouTube Video Delivery Cloud
In this paper we set out to “reverse-engineer” the YouTube video delivery cloud by building a globally distributed active measurement infrastructure. Through careful and extensive data collection, analysis and experiments, we deduce the key design features underlying the YouTube video delivery cloud. The design of the YouTube video delivery cloud consists of three major components: a “flat” vid...
Crowd-Sourced Amputee Gait Data: A Feasibility Study Using YouTube Videos of Unilateral Trans-Femoral Gait
Collecting large datasets of amputee gait data is notoriously difficult. Additionally, collecting data on less prevalent amputations or on gait activities other than level walking and running on hard surfaces is rarely attempted. However, with the wealth of user-generated content on the Internet, the scope for collecting amputee gait data from alternative sources other than traditional gait lab...